Author Profiling using Complementary Second Order Attributes and Stylometric Features
Abstract
In this paper we present an approach to the task of author profiling. We propose a modular framework that extracts two main groups of features, combines them with appropriate preprocessing, and uses Support Vector Machines for classification. The two groups are stylometric and discriminative: trigrams on the one hand and complementary-weighted Second Order Attributes on the other. We treat the problem as profile-based, creating target profiles and grouping each user's tweets into a single document.
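The two feature groups can be illustrated with a minimal sketch using only the Python standard library. It rests on several assumptions that the abstract does not confirm: trigrams are taken to be character trigrams, the second-order representation follows the commonly described scheme in which each term is weighted by its association with each target profile and a document is the frequency-weighted sum of its term vectors, and all function names are hypothetical. The complementary weighting and the SVM classification step are not specified here and are not implemented.

```python
from collections import Counter, defaultdict
import math


def group_tweets_by_user(tweets):
    """Concatenate each user's tweets into one document.

    tweets: iterable of (user_id, text) pairs.
    """
    grouped = defaultdict(list)
    for user_id, text in tweets:
        grouped[user_id].append(text)
    return {user: " ".join(texts) for user, texts in grouped.items()}


def char_trigrams(document):
    """Count character trigrams (assumed unit; the abstract only says 'trigrams')."""
    text = document.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def term_profile_weights(doc_counts, labels):
    """Relate every term to every target profile.

    doc_counts: {user_id: Counter of terms}; labels: {user_id: profile}.
    Returns {term: {profile: weight}} with weights summing to 1 per term.
    """
    raw = defaultdict(lambda: defaultdict(float))
    for user, counts in doc_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue
        for term, tf in counts.items():
            raw[term][labels[user]] += math.log2(1 + tf / total)
    normalised = {}
    for term, per_profile in raw.items():
        norm = sum(per_profile.values())
        normalised[term] = {p: w / norm for p, w in per_profile.items()}
    return normalised


def second_order_vector(counts, term_weights, profiles):
    """Represent a document by its affinity to each profile."""
    vector = {p: 0.0 for p in profiles}
    total = sum(counts.values())
    if total == 0:
        return vector
    for term, tf in counts.items():
        for profile, weight in term_weights.get(term, {}).items():
            vector[profile] += (tf / total) * weight
    return vector


# Toy data with hypothetical profile labels "A" and "B".
tweets = [("u1", "hello"), ("u1", "world"), ("u2", "abc")]
labels = {"u1": "A", "u2": "B"}
documents = group_tweets_by_user(tweets)
trigram_counts = {u: char_trigrams(d) for u, d in documents.items()}
weights = term_profile_weights(trigram_counts, labels)
soa = {u: second_order_vector(c, weights, ["A", "B"])
       for u, c in trigram_counts.items()}
```

In a full pipeline, the trigram counts and the second-order vectors would be concatenated per user and passed to a classifier; that part is left out because the abstract gives no details on how the two groups are combined.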
Similar resources
An Author Profiling Approach Based on Language-dependent Content and Stylometric Features
We describe the approach that we submitted to the 2015 PAN competition [5] for the author profiling task. The task consists in predicting some attributes of an author analyzing a set of his/her Twitter tweets. We consider several sets of stylometric and content features, and different decision algorithms: we use a different combination of features and decision algorithm for each language-attrib...
Author Profiling using Stylometric and Structural Feature Groupings
In this paper we present an approach for the task of author profiling. We propose a coherent grouping of features combined with appropriate preprocessing steps for each group. The groups we used were stylometric and structural, featuring, among others, trigrams and counts of Twitter-specific characteristics. We address gender and age prediction as a classification task and personality prediction...
Grammar Checker Features for Author Identification and Author Profiling Notebook for PAN at CLEF 2013
Our work on author identification and author profiling is based on the question: Can the number and the types of grammatical errors serve as indicators for a specific author or a group of people? In order to detect the grammatical errors we base our approach on the output of the open-source library LanguageTool. In the case of the author identification we transform the problem into a statistica...
Using Textual Transcripts of Parliamentary Interventions for Profiling Portuguese Politicians
This paper presents an experimental study on the subject of profiling political actors through textual transcriptions of their parliamentary interventions. Supervised learning techniques were used to learn models, which attempt to classify Portuguese politicians according to their gender, their age group, or their political affiliation and orientation. Experiments were made using different type...
Exploring Performance-Based Music Attributes for Stylometric Analysis
Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in MIDI music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs and then mined using supervised learning techniques. Two attributes are identified that provide high information gain. These attributes are then used as st...